10 research outputs found

    Robust Mobile Malware Detection

    Get PDF
    The increasing popularity and use of smartphones and hand-held devices have made them the most popular target for malware attackers. Researchers have proposed machine learning-based models to automatically detect malware attacks on these devices. Since these models learn application behaviors solely from the extracted features, choosing an appropriate and meaningful feature set is one of the most crucial steps for designing an effective mobile malware detection system. There are four categories of features for mobile applications. Previous works have taken arbitrary combinations of these categories to design models, resulting in sub-optimal performance. This thesis systematically investigates the individual impact of these feature categories on mobile malware detection systems. Feature categories that complement each other are investigated and categories that add redundancy to the feature space (thereby degrading the performance) are analyzed. In the process, the combination of feature categories that provides the best detection results is identified. Ensuring reliability and robustness of the above-mentioned malware detection systems is of utmost importance as newer techniques to break down such systems continue to surface. Adversarial attack is one such evasive attack that can bypass a detection system by carefully morphing a malicious sample even though the sample was originally correctly identified by the same system. Self-crafted adversarial samples can be used to retrain a model to defend against such attacks. However, randomly using too many such samples, as is currently done in the literature, can further degrade detection performance. This work proposed two intelligent approaches to retrain a classifier through the intelligent selection of adversarial samples. The first approach adopts a distance-based scheme where the samples are chosen based on their distance from malware and benign cluster centers while the second selects the samples based on a probability measure derived from a kernel-based learning method. The second method achieved a 6% improvement in terms of accuracy. To ensure practical deployment of malware detection systems, it is necessary to keep the real-world data characteristics in mind. For example, the benign applications deployed in the market greatly outnumber malware applications. However, most studies have assumed a balanced data distribution. Also, techniques to handle imbalanced data in other domains cannot be applied directly to mobile malware detection since they generate synthetic samples with broken functionality, making them invalid. In this regard, this thesis introduces a novel synthetic over-sampling technique that ensures valid sample generation. This technique is subsequently combined with a dynamic cost function in the learning scheme that automatically adjusts minority class weight during model training which counters the bias towards the majority class and stabilizes the model. This hybrid method provided a 9% improvement in terms of F1-score. Aiming to design a robust malware detection system, this thesis extensively studies machine learning-based mobile malware detection in terms of best feature category combination, resilience against evasive attacks, and practical deployment of detection models. Given the increasing technological advancements in mobile and hand-held devices, this study will be very useful for designing robust cybersecurity systems to ensure safe usage of these devices.Doctor of Philosoph

    Heuristic for Lowering Electricity Costs for Routing in Optical Data Center Networks

    Get PDF
    Optical data centers consume a large quantity of energy and the cost of that energy has a significant contribution to the operational cost in data centers. The amount of electricity consumption in data centers and their related costs are increasing day by day. Data centers are geographically distributed all around the continents and the growing numbers of data replicas have made it possible to find more cost effective network routing. Besides flat-rate prices, today, there are companies which offers real-time pricing. In order to address the energy consumption cost problem, we propose an energy efficient routing scheme to find least cost path to the replicas based on real-time pricing model called energy price aware routing (EPAR). Our research considers anycast data transmission model to find the suitable replica as well as the fixed window traffic allocation model for demand request to reduce the energy consumption cost of data center networks

    Mobile malware detection with imbalanced data using a novel synthetic oversampling strategy and deep learning

    No full text
    Mobile malware detection is inherently an imbalanced data problem since the number of benign applications in the market is far greater than the number of malicious applications. Existing methods to handle imbalanced data, such as synthetic minority over-sampling, do not translate well into this domain since mobile malware detection generally deals with binary features and these methods are designed for continuous features. Also, methods adapted for categorical features cannot be applied here since random modifications of features can result in invalid sample generation. In this work, we propose a novel technique for generating synthetic samples for mobile malware detection with imbalanced data. Our proposed method adds new data points in the sample space by generating synthetic malware samples which also preserves the original functionality of the malicious apps. Experiments show that the proposed approach outperforms existing techniques in terms of precision, recall, F1score, and AUC. This study will be useful in building deep neural network-based systems to handle imbalanced data for mobile malware detection. © 2020 IEEE

    Malware detection in edge devices with fuzzy oversampling and dynamic class weighting

    No full text
    In Internet-of-things (IoT) domain, edge devices are used increasingly for data accumulation, preprocessing, and analytics. Intelligent integration of edge devices with Artificial Intelligence (AI) facilitates real-time analysis and decision making. However, these devices simultaneously provide additional attack opportunities for malware developers, potentially leading to information and financial loss. Machine learning approaches can detect such attacks but their performance degrades when benign samples substantially outnumber malware samples in training data. Existing approaches for such imbalanced data assume samples represented as continuous features and thus can generate invalid samples when malware applications are represented by binary features. We propose a novel malware oversampling technique that addresses this issue. Further, we propose two approaches for malware detection. Our first approach uses fuzzy set theory, while the second approach dynamically assigns higher priority to malware samples using a novel loss function. Combining our oversampling technique with these approaches, the proposed approach attains over 9% improvement over competing methods in terms of F1_score. Our approaches can, therefore, result in enhanced privacy and security in edge computing services. © 2021 Elsevier B.V

    Mobile malware detection : an analysis of deep learning model

    No full text
    Due to its widespread use, with numerous applications deployed everyday, smartphones have become an inevitable target of the malware developers. This huge number of applications renders manual inspection of codes infeasible; as such, researchers have proposed several malware detection techniques based on automatic machine learning tools. Deep learning has gained a lot of attention from the malware researchers due to its ability of capture complex relationships among inputs and outputs. However, deep learning models depend largely on several hyper-parameters (i.e., learning rate, batch size, dropout rate). Hence, it is of utmost importance to analyze the effect of these parameters on classifier performance. In this paper, we systematically studied the effect of these parameters along with the effect of network architecture. We showed that building arbitrary deep networks does not always improve classifier performance. We also determined the combination of hyper-parameters that yields best result. This study will be useful in building better deep neural network based model for malware classification

    Mobile malware detection: An analysis of deep learning model

    No full text
    Imam, T ORCiD: 0000-0002-8864-4155© 2019 IEEE. Due to its widespread use, with numerous applications deployed everyday, smartphones have become an inevitable target of the malware developers. This huge number of applications renders manual inspection of codes infeasible; as such, researchers have proposed several malware detection techniques based on automatic machine learning tools. Deep learning has gained a lot of attention from the malware researchers due to its ability of capture complex relationships among inputs and outputs. However, deep learning models depend largely on several hyper-parameters (i.e., learning rate, batch size, dropout rate). Hence, it is of utmost importance to analyze the effect of these parameters on classifier performance. In this paper, we systematically studied the effect of these parameters along with the effect of network architecture. We showed that building arbitrary deep networks does not always improve classifier performance. We also determined the combination of hyperparameters that yields best result. This study will be useful in building better deep neural network based model for malware classification

    Robust malware defense in industrial IoT applications using machine learning with selective adversarial samples

    Get PDF
    Imam, T ORCiD: 0000-0002-8864-4155Industrial Internet of Things (IIoT) deploys edge devices to act as intermediaries between sensors and actuators and application servers or cloud services. Machine learning models have been widely used to thwart malware attacks in such edge devices. However, these models are vulnerable to adversarial attacks where attackers craft adversarial samples by introducing small perturbations to malware samples to fool a classifier to misclassify them as benign applications. Literature on deep learning networks proposes adversarial retraining as a defense mechanism where adversarial samples are combined with legitimate samples to retrain the classifier. However, existing works select such adversarial samples in a random fashion which degrades the classifier's performance. This work proposes two novel approaches for selecting adversarial samples to retrain a classifier. One, based on the distance from malware cluster center, and the other, based on a probability measure derived from a kernel based learning (KBL). Our experiments show that both of our sample selection methods outperform the random selection method and the KBL selection method improves detection accuracy by 6%. Also, while existing works focus on deep neural networks with respect to adversarial retraining, we additionally assess the impact of such adversarial samples on other classifiers and our proposed selective adversarial retraining approaches show similar performance improvement for these classifiers as well. The outcomes from the study can assist in designing robust security systems for IIoT applications

    Selective adversarial learning for mobile malware

    No full text
    Imam, T ORCiD: 0000-0002-8864-4155Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial attacks. Adversarial samples are crafted from legitimate inputs by carefully introducing small perturbation to the input so that the classifier is fooled. Adversarial retraining, which involves retraining the classifier using adversarial samples, has been shown to improve the robustness of the classifier against adversarial attacks. However, it has been also shown that retraining with too many samples can lead to performance degradation. Hence, a careful selection of the adversarial samples that are used to retrain the classifier is necessary, yet existing works select these samples in a randomized fashion. In our work, we propose two novel approaches for selecting adversarial samples: based on the distance from cluster center of malware and based on the probability derived from a kernel based learning (KBL). Our experiment results show that both of our selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve the classifier performance against adversarial attacks. In particular, selection with KBL delivers above 6% improvement in detection accuracy compared to random selection. The method proposed here has greater impact in designing robust machine learning system for security applications

    Mobile malware detection with imbalanced data using a novel synthetic oversampling strategy and deep learning

    No full text
    Mobile malware detection is inherently an imbalanced data problem since the number of benign applications in the market is far greater than the number of malicious applications. Existing methods to handle imbalanced data, such as synthetic minority over-sampling, do not translate well into this domain since mobile malware detection generally deals with binary features and these methods are designed for continuous features. Also, methods adapted for categorical features cannot be applied here since random modifications of features can result in invalid sample generation. In this work, we propose a novel technique for generating synthetic samples for mobile malware detection with imbalanced data. Our proposed method adds new data points in the sample space by generating synthetic malware samples which also preserves the original functionality of the malicious apps. Experiments show that the proposed approach outperforms existing techniques in terms of precision, recall, F1score, and AUC. This study will be useful in building deep neural network-based systems to handle imbalanced data for mobile malware detection

    Malware detection in edge devices with fuzzy oversampling and dynamic class weighting

    No full text
    In Internet-of-things (IoT) domain, edge devices are used increasingly for data accumulation, preprocessing, and analytics. Intelligent integration of edge devices with Artificial Intelligence (AI) facilitates real-time analysis and decision making. However, these devices simultaneously provide additional attack opportunities for malware developers, potentially leading to information and financial loss. Machine learning approaches can detect such attacks but their performance degrades when benign samples substantially outnumber malware samples in training data. Existing approaches for such imbalanced data assume samples represented as continuous features and thus can generate invalid samples when malware applications are represented by binary features. We propose a novel malware oversampling technique that addresses this issue. Further, we propose two approaches for malware detection. Our first approach uses fuzzy set theory, while the second approach dynamically assigns higher priority to malware samples using a novel loss function. Combining our oversampling technique with these approaches, the proposed approach attains over 9% improvement over competing methods in terms of F1_score. Our approaches can, therefore, result in enhanced privacy and security in edge computing services
    corecore